Adversarial Attacks on Black Box Video Classifiers: Leveraging the Power of Geometric Transformations (Supplementary Material)
We observe that our method outperforms the baseline methods in a statistically significant way. We consider four state-of-the-art video classification models that represent diverse approaches to learning from videos, namely C3D [1], SlowFast [2], TPN [3], and I3D [4], as the black-box victim models for our adversarial attacks. The C3D model applies 3D convolutions to learn spatio-temporal features from videos. SlowFast uses a two-pathway architecture in which the slow pathway operates at a low frame rate to capture spatial semantics and the fast pathway operates at a high frame rate to capture motion at fine temporal resolution. I3D introduces the Inflated 3D ConvNet, which inflates the 2D filters and pooling kernels of traditional 2D CNNs into 3D.
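The two-pathway idea behind SlowFast can be illustrated with a small frame-sampling sketch. The function name `slowfast_indices` and the parameters `slow_stride` and `alpha` are hypothetical, and the default values are illustrative assumptions rather than settings taken from the text; the sketch only shows that the fast pathway samples the same clip more densely than the slow pathway.

```python
def slowfast_indices(num_frames, slow_stride=16, alpha=8):
    """Return (slow, fast) frame indices for a clip of num_frames frames.

    slow_stride and alpha are illustrative assumptions: the fast pathway
    samples alpha times more densely than the slow pathway.
    """
    if slow_stride % alpha != 0:
        raise ValueError("slow_stride must be divisible by alpha")
    fast_stride = slow_stride // alpha
    # Slow pathway: few frames, low temporal rate (spatial semantics).
    slow = list(range(0, num_frames, slow_stride))
    # Fast pathway: many frames, high temporal rate (fine motion).
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


slow, fast = slowfast_indices(64)
print(len(slow), len(fast))  # prints "4 32"
```

With these assumed defaults, a 64-frame clip yields 4 slow-pathway frames and 32 fast-pathway frames, and every slow frame is also seen by the fast pathway.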
Boosting the Transferability of Adversarial Attack on Vision Transformer with Adaptive Token Tuning
Vision transformers (ViTs) perform exceptionally well in various computer vision tasks but remain vulnerable to adversarial attacks. Recent studies have shown that adversarial examples transfer across CNNs, and the same holds true for ViTs. However, existing ViT attacks aggressively regularize the largest token gradients to exactly zero within each layer of the surrogate model and overlook the interactions between layers, which limits their transferability when attacking black-box models. Therefore, in this paper, we focus on boosting the transferability of adversarial attacks on ViTs through adaptive token tuning (ATT). Specifically, we propose three optimization strategies: an adaptive gradient re-scaling strategy to reduce the overall variance of token gradients, a self-paced patch out strategy to enhance the diversity of input tokens, and a hybrid token gradient truncation strategy to weaken the effectiveness of the attention mechanism.
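The contrast between zeroing the largest token gradient and re-scaling gradients to reduce their variance can be sketched on a toy list of numbers. This is an illustration only, not the ATT update rule: the functions `zero_largest` and `rescale_toward_mean`, the `strength` parameter, and the sample gradient values are all assumptions introduced here.

```python
from statistics import pvariance


def zero_largest(grads):
    """Hard regularization: set the largest-magnitude gradient to exactly zero."""
    out = list(grads)
    idx = max(range(len(out)), key=lambda i: abs(out[i]))
    out[idx] = 0.0
    return out


def rescale_toward_mean(grads, strength=0.5):
    """Hypothetical soft alternative: shrink every gradient toward the mean.

    strength lies in [0, 1]; 0 leaves the gradients unchanged and 1
    collapses them to the mean. Not the formula used by ATT.
    """
    mean = sum(grads) / len(grads)
    return [mean + (1.0 - strength) * (g - mean) for g in grads]


# Made-up per-token gradient values with one extreme entry.
token_grads = [0.1, -0.2, 3.0, 0.05, -0.15]
hard = zero_largest(token_grads)
soft = rescale_toward_mean(token_grads)
print(pvariance(token_grads), pvariance(hard), pvariance(soft))
```

In this toy setting the soft variant scales the variance by `(1 - strength) ** 2` while keeping every token's gradient non-zero, whereas the hard variant discards the extreme token's signal entirely.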
BadTrack: A Poison-Only Backdoor Attack on Visual Object Tracking
Bin Huang, Jiaqian Yu
Visual object tracking (VOT) is one of the most fundamental tasks in the computer vision community. State-of-the-art VOT trackers extract positive and negative examples that guide the tracker to distinguish the object from the background. In this paper, we show that this characteristic can be exploited to introduce new threats, and we propose a simple yet effective poison-only backdoor attack.
DarkSAM: Fooling Segment Anything Model to Segment Nothing
Ziqi Zhou, Yufei Song
The Segment Anything Model (SAM) has recently gained much attention for its outstanding generalization to unseen data and tasks. Despite its promise, the vulnerabilities of SAM, especially to universal adversarial perturbations (UAPs), have not yet been thoroughly investigated. In this paper, we propose DarkSAM, the first prompt-free universal attack framework against SAM, which comprises a semantic decoupling-based spatial attack and a texture distortion-based frequency attack. We first divide the output of SAM into foreground and background. Then, we design a shadow target strategy to obtain the semantic blueprint of the image as the attack target.
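The step of dividing the model output into foreground and background can be sketched as a simple threshold over per-pixel mask scores. This is a minimal illustration under stated assumptions: the output is taken to be a 2D grid of mask logits, the cut-off `threshold=0.0` is assumed, and the helper `split_mask` is hypothetical rather than part of DarkSAM or any SAM library.

```python
def split_mask(logits, threshold=0.0):
    """Split a 2D grid of mask logits into foreground and background pixels.

    threshold=0.0 is an assumed cut-off. Returns two lists of
    (row, col) coordinates: foreground first, then background.
    """
    foreground, background = [], []
    for r, row in enumerate(logits):
        for c, value in enumerate(row):
            if value > threshold:
                foreground.append((r, c))
            else:
                background.append((r, c))
    return foreground, background


# Made-up 2x2 grid of mask logits for illustration.
logits = [[2.1, -0.3], [-1.5, 0.7]]
fg, bg = split_mask(logits)
print(fg, bg)  # prints "[(0, 0), (1, 1)] [(0, 1), (1, 0)]"
```

Once the two pixel sets are separated, an attack could treat them with different objectives, which is the kind of decoupling the abstract refers to.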